Add per-attempt LLM spans under call-level retry (0050) #170
Merged
Flip conformance.toml [proposals."0050"] partial -> implemented (since 0.15.0): the call-level-retry per-attempt span surface now ships. Document the openarmature.llm.attempt_index attribute and the per-attempt span behavior in the observability concepts page, plus notes that span enrichers receive LlmRetryAttemptEvent on the LLM span and that the bundled provider dispatches that internal event alongside the unchanged terminal events. Add the 0.15.0 changelog section covering this work and backfilling the 0061 detached-trace invocation span (which landed without an entry), plus the v0.60.0 -> v0.61.0 spec-pin bullet.
_build_llm_retry_attempt_event constructed a full LlmRetryAttemptEvent twice, repeating ~18 shared identity, scoping, and request-side fields across the success and failure branches. Hoist them into one base dict and splat it, leaving each branch to add only its outcome fields. No behavior change.
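The refactor described in this commit can be sketched as follows. This is a minimal illustration, not the actual `_build_llm_retry_attempt_event`: the `AttemptEvent` class, its four shared fields, and `build_attempt_event` are hypothetical stand-ins for the real event and its ~18 shared fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical stand-in for LlmRetryAttemptEvent with far fewer fields."""

    call_id: str
    model: str
    attempt_index: int
    latency_ms: float
    response_text: Optional[str] = None   # outcome field, success only
    error_category: Optional[str] = None  # outcome field, failure only


def build_attempt_event(
    *,
    call_id: str,
    model: str,
    attempt_index: int,
    latency_ms: float,
    response_text: Optional[str] = None,
    error_category: Optional[str] = None,
) -> AttemptEvent:
    # Shared identity, scoping, and request-side fields are written once.
    base: dict[str, Any] = {
        "call_id": call_id,
        "model": model,
        "attempt_index": attempt_index,
        "latency_ms": latency_ms,
    }
    # Each branch splats the base dict and adds only its outcome fields.
    if error_category is None:
        return AttemptEvent(**base, response_text=response_text)
    return AttemptEvent(**base, error_category=error_category)


ok = build_attempt_event(
    call_id="c1", model="m", attempt_index=1, latency_ms=12.5, response_text="hi"
)
failed = build_attempt_event(
    call_id="c1", model="m", attempt_index=0, latency_ms=8.0, error_category="rate_limit"
)
```

Both branches produce the same event type as before; only the duplication of the shared fields goes away.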
The OTel observer now renders the LLM span solely from the per-attempt LlmRetryAttemptEvent; terminal LlmCompletionEvent / LlmFailedEvent are ignored. Add a regression test feeding both terminal events and asserting zero openarmature.llm.complete spans, guarding against reintroducing the terminal-event span path. Also fix a stale docstring in _drive_llm_span_with_cached_tokens that still referenced "typed LlmCompletionEvent".
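The shape of that regression test might look roughly like the sketch below. It does not use the real observer or the OpenTelemetry SDK: `ToySpanObserver` and the three reduced event classes are hypothetical stand-ins, and spans are recorded as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, heavily reduced stand-ins for the three event types.
@dataclass(frozen=True)
class AttemptEvent:
    attempt_index: int
    error_category: Optional[str] = None


@dataclass(frozen=True)
class CompletionEvent:
    text: str


@dataclass(frozen=True)
class FailedEvent:
    error_category: str


class ToySpanObserver:
    """Records one span name per per-attempt event; ignores terminal events."""

    def __init__(self) -> None:
        self.spans: list[str] = []

    def on_event(self, event: object) -> None:
        if isinstance(event, AttemptEvent):
            self.spans.append("openarmature.llm.complete")
        # CompletionEvent / FailedEvent deliberately fall through: no span.


def test_terminal_events_produce_no_llm_span() -> None:
    observer = ToySpanObserver()
    observer.on_event(CompletionEvent(text="ok"))
    observer.on_event(FailedEvent(error_category="timeout"))
    # Guards against reintroducing a terminal-event span path.
    assert observer.spans.count("openarmature.llm.complete") == 0


test_terminal_events_produce_no_llm_span()
```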
Pull request overview
Implements proposal 0050’s observability §5.5 “per-attempt LLM spans” by introducing a new per-attempt internal event type and switching the OTel observer to render openarmature.llm.complete spans exclusively from that per-attempt event (one span per call-level retry attempt), while keeping the terminal LlmCompletionEvent / LlmFailedEvent as one-per-call for non-OTel consumers.
Changes:
- Add `LlmRetryAttemptEvent` and dispatch it once per in-call attempt from `OpenAIProvider.complete()` (attempt latency excludes backoff).
- Update OTel observer to create per-attempt `openarmature.llm.complete` spans from `LlmRetryAttemptEvent` and ignore terminal LLM events for span rendering; Langfuse ignores the new per-attempt event.
- Update tests, conformance harness behavior, docs, conformance manifest, and changelog to reflect the per-attempt span contract and proposal 0050 being fully implemented.
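To make "once per in-call attempt" and "attempt latency excludes backoff" concrete, here is a minimal sketch of a call-level retry loop. It is not the `OpenAIProvider.complete()` implementation: `complete_with_retry`, its parameters, the exponential backoff schedule, and the reduced `AttemptEvent` are assumptions for illustration, and the terminal event a real provider would also emit is omitted.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical stand-in for LlmRetryAttemptEvent."""

    attempt_index: int
    latency_s: float
    error_category: Optional[str] = None


def complete_with_retry(
    send: Callable[[], str],
    on_attempt: Callable[[AttemptEvent], None],
    max_attempts: int = 3,
    backoff_s: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    for index in range(max_attempts):
        started = time.perf_counter()
        try:
            result = send()
        except RuntimeError as exc:
            # The clock stops here, before any backoff sleep, so the
            # reported latency covers only the attempt itself.
            latency = time.perf_counter() - started
            on_attempt(AttemptEvent(index, latency, error_category=str(exc)))
            if index == max_attempts - 1:
                raise
            sleep(backoff_s * 2**index)
            continue
        on_attempt(AttemptEvent(index, time.perf_counter() - started))
        return result
    raise ValueError("max_attempts must be at least 1")


# Usage: two failures followed by a success, with sleeps recorded, not slept.
events: list[AttemptEvent] = []
delays: list[float] = []
outcomes = iter([RuntimeError("rate_limit"), RuntimeError("timeout"), "hello"])


def flaky_send() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


answer = complete_with_retry(flaky_send, events.append, sleep=delays.append)
```

A call that succeeds on the first try still dispatches exactly one attempt event, at index 0.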
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/test_observability_otel.py | Updates unit tests to drive OTel spans from per-attempt events; adds regression coverage for ignoring terminal events; adds fixture-driven per-attempt span assertions. |
| tests/unit/test_llm_provider.py | Updates provider emission-shape test to expect per-attempt event followed by terminal event on success. |
| tests/conformance/test_observability.py | Wires new observability fixture; excludes per-attempt internal event from conformance collector stream. |
| tests/conformance/test_llm_provider.py | Notes that call-level retry fixtures’ per-attempt spans are asserted in OTel unit tests rather than the provider conformance harness. |
| tests/_helpers/typed_event.py | Adds helper for constructing LlmRetryAttemptEvent for tests. |
| src/openarmature/observability/otel/observer.py | Renders openarmature.llm.complete spans from LlmRetryAttemptEvent (one per attempt), and ignores terminal LLM events for span creation. |
| src/openarmature/observability/langfuse/observer.py | Explicitly ignores LlmRetryAttemptEvent to keep one Generation per call from terminal events. |
| src/openarmature/observability/correlation.py | Extends dispatch/observer event unions to include LlmRetryAttemptEvent. |
| src/openarmature/llm/providers/openai.py | Emits per-attempt events within the call-level retry loop via a callback, keeping terminal event behavior unchanged. |
| src/openarmature/graph/observer.py | Extends ObserverEvent union and docs to include the per-attempt internal event. |
| src/openarmature/graph/events.py | Adds the LlmRetryAttemptEvent dataclass and exports it from openarmature.graph.events. |
| docs/concepts/observability.md | Documents per-attempt spans, openarmature.llm.attempt_index, and enricher/consumer implications. |
| conformance.toml | Marks proposal 0050 as implemented since 0.15.0 with updated narrative. |
| CHANGELOG.md | Adds 0.15.0 entries describing per-attempt spans and detached-trace invocation span; records spec pin advance. |
PR #170 Copilot review:
- Re-export LlmRetryAttemptEvent from the openarmature.graph package (import block + __all__), matching the sibling LlmCompletionEvent / LlmFailedEvent so the documented observer import path works.
- Replace the brittle type(event).__name__ name match with an isinstance check in the conformance _TypedEventCollector; the filter_event_type string comparison stays as-is.
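Why the name match is brittle can be shown with a small, self-contained comparison. The classes and helper functions below are hypothetical; they are not the conformance collector's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical stand-in for LlmRetryAttemptEvent."""

    attempt_index: int


@dataclass(frozen=True)
class TracedAttemptEvent(AttemptEvent):
    """Hypothetical subclass, such as one a test helper might define."""

    trace_id: str = ""


def is_attempt_by_name(event: object) -> bool:
    # Compares a string: misses subclasses, matches unrelated lookalikes.
    return type(event).__name__ == "AttemptEvent"


def is_attempt_by_type(event: object) -> bool:
    # Compares the class itself: follows inheritance, ignores lookalikes.
    return isinstance(event, AttemptEvent)


subclass_instance = TracedAttemptEvent(attempt_index=0, trace_id="t1")
# An unrelated class that merely shares the name.
lookalike = type("AttemptEvent", (), {})()

print(is_attempt_by_name(subclass_instance), is_attempt_by_type(subclass_instance))
print(is_attempt_by_name(lookalike), is_attempt_by_type(lookalike))
```

The name match gets both cases wrong, while `isinstance` also survives a class rename without a silent filter miss.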
Summary
Completes proposal 0050 by implementing the call-level-retry per-attempt LLM span surface (observability §5.5 / llm-provider §7.1). 0050 shipped `partial` in v0.14.0 (failure-isolation middleware and the `complete(retry=...)` loop); this branch lands the deferred piece: under call-level retry, the OTel observer now emits one `openarmature.llm.complete` span per attempt rather than one per call.
What changed
- `LlmRetryAttemptEvent` (frozen, exported from `openarmature.graph`) is dispatched once per in-call attempt, carrying that attempt's identity / scoping, request-side fields, and outcome (`error_category is None` discriminates success from failure).
- `OpenAIProvider.complete()` dispatches one `LlmRetryAttemptEvent` per attempt, including the single attempt of a no-retry call (at index 0), with per-attempt latency that excludes backoff. The terminal `LlmCompletionEvent` / `LlmFailedEvent` are unchanged: still exactly one per call.
- The OTel observer renders the `openarmature.llm.complete` span(s) solely from `LlmRetryAttemptEvent`, each tagged `openarmature.llm.attempt_index` (0..N-1). A failed intermediate attempt carries ERROR plus the §4 category plus the request-side attributes; the final (or single) attempt carries the full §5.5 response surface. The two terminal events no longer drive the OTel span; they stay on the queue for the Langfuse mapping and payload / latency consumers. This collapses the previous completion and failed handlers into one.
- The Langfuse observer ignores the new per-attempt event (`LlmRetryAttemptEvent`).
- `conformance.toml` flips 0050 to `implemented` (since 0.15.0); the observability concepts page documents the attribute, the per-attempt span behavior, and the enricher / consuming-event implications; a `0.15.0` changelog section lands, also backfilling the 0061 detached-trace span entry.
Design notes
`LlmRetryAttemptEvent` is Python-internal, not a spec-normative event type. The per-attempt span contract is the already-accepted observability §5.5 (one span per attempt, `openarmature.llm.attempt_index` 0..N-1); §5.5 does not pin which internal event the observer renders from, so making this event the sole span source is an implementation choice. The guardrail: each per-attempt span carries the full §5.5 attribute surface, verified by fixtures 057 and 016-021 / 040-042 staying green. For Langfuse, terminal-Generation-per-call is the intended shape; §8 is currently silent on call-level retry, and a spec-side clarification to pin it is tracked (non-blocking).
Testing
Notes
- `0.15.0` changelog date is tentative pending the release tag.
- `conformance.toml`'s proposal note style; not in this PR.
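As an illustration of the per-attempt span shape described under What changed, the sketch below maps one attempt event to a span name, status, and attribute dict. Only the span name `openarmature.llm.complete` and the `openarmature.llm.attempt_index` key come from this PR; the `example.*` attribute keys, the `UNSET` status for successful attempts, the reduced `AttemptEvent`, and `span_fields` itself are assumptions, and no OpenTelemetry SDK is involved.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical stand-in for LlmRetryAttemptEvent."""

    attempt_index: int
    model: str
    error_category: Optional[str] = None
    response_text: Optional[str] = None


def span_fields(event: AttemptEvent) -> dict[str, Any]:
    # Request-side attributes are present on every attempt's span.
    attributes: dict[str, Any] = {
        "openarmature.llm.attempt_index": event.attempt_index,
        "example.request.model": event.model,  # hypothetical key
    }
    if event.error_category is None:
        # Success: the response-side surface is attached as well.
        attributes["example.response.text"] = event.response_text  # hypothetical key
        status = "UNSET"  # assumption: successful spans keep the default status
    else:
        # Failure: ERROR status plus the error category.
        attributes["example.error.category"] = event.error_category  # hypothetical key
        status = "ERROR"
    return {
        "name": "openarmature.llm.complete",
        "status": status,
        "attributes": attributes,
    }


attempts = [
    AttemptEvent(0, "some-model", error_category="rate_limit"),
    AttemptEvent(1, "some-model", response_text="hello"),
]
spans = [span_fields(event) for event in attempts]
```

Two attempts yield two spans with indices 0 and 1, where a single terminal event would have yielded one.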